Speaker recognition using kernel-PCA and intersession variability modeling
Abstract
This paper presents a new method for text-independent speaker recognition. We embed both training and test sessions into a session space, which is the direct sum of a common-speaker subspace and a speaker-unique subspace. The common-speaker subspace is Euclidean and is spanned by a set of reference sessions; kernel-PCA is used to embed sessions into it explicitly. This subspace typically captures attributes that are common to many speakers. The speaker-unique subspace is the orthogonal complement of the common-speaker subspace and typically captures attributes that are unique to a speaker. We model intersession variability in the common-speaker subspace and combine it with the information in the speaker-unique subspace. The proposed framework yields a 43.5% reduction in error rate compared to a Gaussian Mixture Model (GMM) baseline.
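The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each session is already summarized as a fixed-length vector, uses a Gaussian (RBF) kernel as a stand-in for whatever session kernel the paper uses, and applies uncentered kernel-PCA. The names `rbf_kernel`, `fit_common_subspace`, and `embed` are hypothetical. The coordinates returned by `embed` live in the subspace spanned by the reference sessions, and the residual norm measures the component in the orthogonal complement.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel between rows of a and rows of b (assumed kernel choice)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_common_subspace(refs, kernel, tol=1e-10):
    """Eigendecompose the reference Gram matrix (uncentered kernel-PCA).

    Returns a projection matrix whose columns are eigenvectors scaled by
    1/sqrt(eigenvalue), so that kernel values map to orthonormal coordinates.
    """
    gram = kernel(refs, refs)
    eigvals, eigvecs = np.linalg.eigh(gram)
    keep = eigvals > tol * eigvals.max()  # drop numerically null directions
    return eigvecs[:, keep] / np.sqrt(eigvals[keep])

def embed(x, refs, proj, kernel):
    """Return subspace coordinates and the norm of the orthogonal residual."""
    k_x = kernel(x, refs)                 # similarities to reference sessions
    coords = k_x @ proj                   # coordinates in the spanned subspace
    self_sim = np.diag(kernel(x, x))      # squared feature-space norm of each x
    resid_sq = np.maximum(self_sim - (coords ** 2).sum(1), 0.0)
    return coords, np.sqrt(resid_sq)

# Toy data: 6 hypothetical reference sessions, each a 3-dimensional vector.
rng = np.random.default_rng(0)
refs = rng.normal(size=(6, 3))
proj = fit_common_subspace(refs, rbf_kernel)

# Reference sessions lie inside the subspace, so their residual is near zero.
coords, resid = embed(refs, refs, proj, rbf_kernel)

# A session far from every reference lies almost entirely in the complement.
far = np.full((1, 3), 10.0)
far_coords, far_resid = embed(far, refs, proj, rbf_kernel)
```

In this sketch, inner products between embedded reference sessions reproduce their kernel values, which is what makes the subspace Euclidean in practice. How the paper models intersession variability on these coordinates, and how it scores the residual component, is not specified in the abstract and is not reproduced here.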
Related resources
Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
This paper presents a new speaker verification system architecture based on Joint Factor Analysis (JFA) as a feature extractor. In this approach, JFA is used to define a new low-dimensional space, named the total variability factor space, in place of the separate channel and speaker variability spaces of classical JFA. The main contribution of this approach is the use of the cosine kernel in the ...
Trainable speaker diarization
This paper presents a novel framework for speaker diarization. We explicitly model intra-speaker inter-segment variability using a speaker-labeled training corpus and use this modeling to assess the speaker similarity between speech segments. Modeling is done by embedding segments into a segment-space using kernel-PCA, followed by explicit modeling of speaker variability in the segment-space. O...
Application of speaker- and language identification state-of-the-art techniques for emotion recognition
This paper describes our efforts to transfer feature extraction and statistical modeling techniques from the fields of speaker and language identification to the related field of emotion recognition. We give detailed insight into our acoustic and prosodic feature extraction and show how to apply Gaussian Mixture Modeling techniques on top of it. We focus on different flavors of Gaussian Mixtu...
Does session variability compensation in speaker recognition model intrinsic variation under mismatched conditions?
Intersession variability (ISV) compensation in speaker recognition is well studied with respect to extrinsic variation, but little is known about its ability to model intrinsic variation. We find that ISV compensation is remarkably successful on a corpus of intrinsic variation that is highly controlled for channel (a dominant component of ISV). The results are particularly surprising because th...
Intersession Compensation and Scoring Methods in the i-vectors Space for Speaker Recognition
The total variability factor space, used in speaker verification system architectures based on Factor Analysis (FA), has greatly improved speaker recognition performance. Carrying out channel compensation in a low-dimensional total factor space, rather than in the GMM supervector space, allows for the application of new techniques. We propose here new intersession compensation and scoring methods. Fur...